Each trial has an input phase (left) and a test phase (right). During the input phase, the speaker directly names objects from a particular category. In the test phase, new objects from the same category are shown, but the speaker ambiguously asks for “it”. Main question: Do children pick the object from the same category?
Setup for child experiment. Input trial on the left and test trial on the right. Children heard pre-recorded utterances.
I fit Bayesian GLMMs with brms. The models include random effects for subject and item (speaker) and random slopes for condition whenever applicable. I used model comparisons as the main way of doing inference about age and condition effects. I used the following indicators (following Richard McElreath’s book):
WAIC score: WAIC is an information criterion that guards against overfitting by penalizing a model for its effective number of parameters. It is taken to be an indicator of how good a model’s out-of-sample predictions are. The lower, the better.
WAIC weights: A model’s weight is an estimate of the probability that the model will make the best predictions on new data, conditional on the set of models considered. The weights sum to 1 across the models compared.
After finding the “winning model”, I use Bayes Factors to compare it to alternative models without the predictors in question. I also look at the predictors directly within the model to check their direction (positive vs. negative) and whether the credible interval (CI) includes 0.
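For reference, the definition of WAIC in McElreath’s book can be written as below. The symbols are introduced here for illustration and do not appear elsewhere in this report: $y_i$ is observation $i$, $\theta$ stands for the model parameters, and expectations and variances are taken over the posterior samples.

$$
\mathrm{WAIC} = -2\left(\mathrm{lppd} - p_{\mathrm{WAIC}}\right), \qquad
\mathrm{lppd} = \sum_{i} \log \mathbb{E}_{\theta}\left[p(y_i \mid \theta)\right], \qquad
p_{\mathrm{WAIC}} = \sum_{i} \mathrm{Var}_{\theta}\left[\log p(y_i \mid \theta)\right]
$$

The penalty term $p_{\mathrm{WAIC}}$ is often described as the effective number of parameters; it grows with model flexibility rather than with a raw parameter count.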
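As a rough illustration of how such weights relate to the WAIC scores, the sketch below computes Akaike-style weights from a list of WAIC values. It assumes the weights are derived as in McElreath’s book (each model’s relative likelihood is exp(-0.5 × difference to the best WAIC), normalized to sum to 1). The function name `waic_weights` is hypothetical, and this is not the code used to produce the report.

```python
import math


def waic_weights(waic_scores):
    """Return Akaike-style weights for a list of WAIC scores (lower = better)."""
    best = min(waic_scores)
    # Relative likelihood of each model compared to the best-scoring one.
    relative = [math.exp(-0.5 * (score - best)) for score in waic_scores]
    total = sum(relative)
    # Normalize so that the weights sum to 1 over the models considered.
    return [value / total for value in relative]


# WAIC scores reported further below for the age model and its null model.
weights = waic_weights([386.92, 387.71])
print([round(w, 2) for w in weights])
```

With the two WAIC scores reported below for the age model and the null model, this gives weights of roughly 0.60 and 0.40, in line with the corresponding table.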
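To give a feel for the size of a Bayes Factor, it can be converted into a posterior model probability. The sketch below assumes only two models are compared and, by default, that both are equally likely a priori; the function name `posterior_model_probability` and the example value are illustrative and not taken from the analysis.

```python
def posterior_model_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of model A, given BF(A over B) and prior odds A:B."""
    if bayes_factor <= 0 or prior_odds <= 0:
        raise ValueError("bayes_factor and prior_odds must be positive")
    # Posterior odds = Bayes factor x prior odds.
    posterior_odds = bayes_factor * prior_odds
    # Convert odds to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative value only: a BF of 3 with equal prior odds.
print(posterior_model_probability(3.0))
```

Under equal prior odds, a BF of 3 corresponds to a posterior probability of 0.75 for the favored model, and a BF of 1 leaves both models at 0.5.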
We have adult data (MTurk) for Experiments 1 and 3. For Experiment 2, we have adult data that test the effect of amount of input, but with different conditions. Adult experiments were not pre-registered, mainly because we used them to find the best procedure. If we were to include adults in the paper, we might want to consider running them again with a pre-registration. Here, they are only included in the plots.
Children received 6 input trials before the test trials. The main question was whether they pick the object from the same category.
We tested 2-, 3-, and 4-year-olds. We added 2yo later and also increased the sample size for them because we expected a smaller effect. Children received 4 trials in a single condition.
| age_group | n |
|---|---|
| 2 | 30 |
| 3 | 21 |
| 4 | 20 |
We bin data by age and use a one-sample Bayesian t-test to compare performance to chance. The table below shows Bayes Factors (BF) for each age group.
| age_group | mean | BF |
|---|---|---|
| 2 | 0.417 | 0.592 |
| 3 | 0.595 | 90.772 |
| 4 | 0.550 | 10.392 |
3yo and 4yo seem to make the basic inference, while the evidence for 2yo is inconclusive (BF below 1). Because of that, we did not run 2yo in subsequent experiments.
Here we use a Bayesian GLMM to look at the effect of age. For inference we compare it to a model without age as a predictor.
The table below shows WAIC scores and weights for each model (RE = random effects, same for each model). The column BF_age_model shows the Bayes Factor for the comparison between the full model (here: with age) and each reduced model (here: without age).
| model | WAIC | SE | weight | BF_age_model |
|---|---|---|---|---|
| model_w_age: age_num + RE | 386.92 | 7.67 | 0.6 | - |
| null_model: 1 + RE | 387.71 | 7.09 | 0.4 | 1.66 |
The model comparison slightly favors the model with age as predictor. This is broadly in line with the analysis binned by age. However, the effect of age is not strong: the WAIC difference is small relative to its SE, the BF in favor of the model with age is low, and the CI of the age predictor includes 0 (see plot below).
Posterior distribution for model fixed effects. Point indicates posterior mean, thick line shows 50% CI and thin line shows 95% CI.
Here we varied the amount of input children received before each test trial. They either heard the speaker name six objects from the same category (high input) or just one (low input). The main question was whether their performance drops with less input.
The sample size in this study was rather small because age effects were not our major focus. We were mainly interested in the effect of condition. Children again received 4 trials, 2 in each condition.
| age_group | n |
|---|---|
| 3 | 18 |
| 4 | 15 |
The table below shows WAIC scores and weights for each model. The column BF_int_model shows the Bayes Factor for the model comparison between the full model (interaction model) and each reduced model (here: with main effect of condition or null model without condition).
| model | WAIC | SE | weight | BF_int_model |
|---|---|---|---|---|
| model_w_interaction: condition * age_num + RE | 177.93 | 11.07 | 0.22 | - |
| model_w_condition: condition + age_num + RE | 177.16 | 9.98 | 0.33 | 4.35 |
| null_model: age_num + RE | 176.53 | 9.27 | 0.45 | 10.54 |
The WAIC comparison favors the null model with only age as predictor, although the differences between models are small relative to their SEs. (Note, however, that the Bayes Factors favor the interaction model.) Condition does not seem to affect children’s performance. Overall, adding a second condition seems to have made the general task harder for younger children. This is also reflected in the reliably positive effect of age in the null model (plotted below).
Posterior distribution for model fixed effects. Point indicates posterior mean, thick line shows 50% CI and thin line shows 95% CI.
Here we tested whether children make speaker-specific inferences, that is, whether they see the topic of a conversation as specific to a particular speaker. Children received 6 input trials with one speaker, then the speaker left the scene and either the same or a different speaker returned. At test, the speaker always asked for “it”.
Here we again tested a larger sample to also look at age effects. Children received 4 trials, 2 in each condition.
| age_group | n |
|---|---|
| 3 | 30 |
| 4 | 30 |
The table below shows WAIC scores and weights for each model. The column BF_int_model shows the Bayes Factor for the model comparison between the full model (interaction model) and each reduced model (here: with main effect of condition or null model without condition).
| model | WAIC | SE | weight | BF_int_model |
|---|---|---|---|---|
| model_w_interaction: condition * age_num + RE | 325.18 | 10.44 | 0.60 | - |
| model_w_condition: condition + age_num + RE | 327.67 | 9.49 | 0.17 | 27.4 |
| null_model: age_num + RE | 327.09 | 9.02 | 0.23 | 79.09 |
The model comparison favors the interaction model. Bayes Factors also suggest that this model fits the data better than the other models. When looking at the model predictors, we see a positive interaction effect, mirroring what we see in the graphs above: younger children do not distinguish between the two conditions, but older children do.
Posterior distribution for model fixed effects. Point indicates posterior mean, thick line shows 50% CI and thin line shows 95% CI.
Children make inferences about what the general “topic” of a discourse is and use this to identify the referent of an ambiguous utterance. The amount of input they receive seems to have no direct effect on this inference. Older children treat the discourse topic as something that is specific to a particular speaker.